In the past couple of years several studies have shown that hybridization in Affymetrix DNA
microarrays can be rather well understood on the basis of simple models of physical chemistry. In
the majority of cases a Langmuir isotherm was used to fit experimental data. Although there is
a general consensus about this approach, some discrepancies between different studies are evident.
For instance, some authors have fitted the hybridization affinities from the microarray fluorescence
intensities, while others used affinities obtained from melting experiments in solution. The former
approach yields fitted affinities that at first sight are only partially consistent with solution values.
In this paper we show that this discrepancy is only superficial: a sufficiently complete model
provides effective affinities which are fully consistent with those fitted to experimental data. This
link provides new insight into the relevant processes underlying the functioning of DNA microarrays.

PACS numbers: 87.15.-v,82.39.Pj 



I. INTRODUCTION 

In all living cells the genes are transcribed, i.e., copied
into messenger RNA (mRNA), at different rates [?].
These rates depend on the type of cell, on the stage
of the cell life cycle and on external stimuli, such as
changes in pH or temperature, or the presence of chemicals.
The abundance of a specific mRNA defines the so-called
gene expression level. It is of central importance to
understand when, in which tissue and in which amount
a given gene is expressed. This knowledge is, for instance,
crucial for understanding several diseases that originate
from deregulation of the gene transcription process, i.e.,
pathologies triggered by genes which are overexpressed
or underexpressed.

DNA microarrays have become pivotal devices in
molecular biology as they allow a genome-wide screening
of gene expression levels in a single experiment. Both
commercial and home-made microarrays are nowadays
available. One of the leading companies in the DNA
microarray market is Affymetrix, which produces high-density
oligonucleotide microarrays [?]. In Affymetrix
arrays, photolithographic techniques are used to grow
single-stranded DNA sequences 25 nucleotides long on
a solid substrate; these are normally referred to as
probes. The array is placed in contact with a solution
containing RNA molecules, the targets, extracted from
biological samples. Targets that are complementary
to probe sequences tend to bind to them, a process
known as hybridization. Biotin molecules are attached to
a fraction of the nucleotides in the target sequences. Once
hybridization has occurred and the unbound targets have been
washed away, streptavidin molecules, which carry fluorescent
labels, are added to the solution. The streptavidin binds
with high affinity to the biotin, so that the amount of
hybridized probe-target duplexes can be determined
experimentally by optical measurements.

Two specific aspects of Affymetrix arrays are: (1) several
probes are complementary to the same target
molecule (these probes form the so-called probe set), and
(2) each perfect matching (PM) probe has a partner probe
which differs by a single nucleotide in the middle position,
the so-called mismatch (MM) probe. The use of
multiple probes for the same target RNA increases the
reliability of the gene expression levels determined with
Affymetrix arrays, since these are obtained from simultaneous
measurements of several fluorescent signals. The
signals measured from MM probes can be used as a test
of the quality of the hybridization experiment. Usually,
one expects PM probes to give a stronger signal than
the corresponding MM probes. However, "bright mismatches",
i.e., higher signals from MM than from PM probes,
are observed quite frequently [?].

The hybridization of complementary strands in solution,
or the reverse process of DNA/RNA melting, has
been widely investigated in recent years. Measurements
of melting temperatures of short oligonucleotides
have yielded estimates of the enthalpy and entropy
differences ΔH and ΔS between a double helix and the
two separate strands. It turns out that ΔH and ΔS
can be well approximated by a sum over local terms
depending on pairs of neighboring nucleotides, plus possible
boundary terms. This defines the so-called nearest-neighbor
model [?]. Table I gives an example of nearest-neighbor
free energy parameters obtained from measurements
of melting temperatures of DNA/RNA duplexes in
solution. The free energy differences are obtained from
ΔG = ΔH − TΔS, assuming that the experimentally
measured ΔH and ΔS are temperature independent.

TABLE I: The stacking free energy parameters ΔG for
RNA/DNA hybrids measured in solution at a salt concentration
of 1 M NaCl and at 45 °C. The upper strand is RNA
(orientation 5'-3') and the lower strand DNA (orientation
3'-5'). The helix initiation energy is ΔG_init = 3.14 kcal/mole.

    Sequence   −ΔG (kcal/mole)    Sequence   −ΔG (kcal/mole)
    rAA/dTT         0.83          rAC/dTG          ?
    rAG/dTC          ?            rAU/dTA          ?
    rCA/dGT          ?            rCC/dGG          ?
    rCG/dGC          ?            rCU/dGA          ?
    rGA/dCT         1.21          rGC/dCG         2.56
    rGG/dCC          ?            rGU/dCA         0.93
    rUA/dAT         0.42          rUC/dAG         1.31
    rUG/dAC          ?            rUU/dAA        −0.08

The hybridization process in microarrays is not identical
to that in solution, as one of the two strands is surface-bound.
A review of recent work on hybridization on
surface-immobilized DNA [?] shows that the rate constants
for hybridization are lower than those predicted
by the nearest-neighbor model in solution. The comparison
was made with experiments using a single target species
and probes of equal length [?].

Several studies [?] have recently
discussed the role of the Langmuir isotherm and variants
thereof in connection with DNA microarrays. Research
toward a physics-based modeling of hybridization
in Affymetrix arrays can roughly be divided into two
approaches. The first approach is to identify empirical
functions with many degrees of freedom that are fitted
to experimental data [3, 16]. The other approach is
molecular-based and employs the hybridization free energies
in solution; it then requires a rescaling of parameters such as
the effective temperature [?]. The aim of this paper
is to link these two apparently different viewpoints. We
shall show that, when the appropriate quantities, i.e. the
effective affinities, are compared, the two models
yield fully consistent results.

FIG. 1: Position-dependent effective affinities fitted from
Affymetrix data of the HGU95A chipset. Three different
background values are subtracted: I_0 = 0, 50 and 100. The
three topmost curves are the affinities for nucleotide C and
the three lowest curves those for nucleotide A. The affinities
for T and G are almost degenerate.

This paper is organized as follows: Sec. II reanalyzes
the binding affinities as introduced by Naef and Magnasco
[3] and Binder and Preibisch [16]. We carry out a
sensitivity analysis and show which features are robust
and which are sensitive. In Sec. III effective affinities are
calculated using a molecular model based on the binding
free energies of Sugimoto et al. [?] and the extension by
Carlon and Heim [17]. Using this model, we calculate and
analyze how different additions to the molecular model
affect the effective affinities. Section IV
concludes the paper and summarizes the main results.

II. EFFECTIVE AFFINITIES FOR 
AFFYMETRIX ARRAYS 

We now turn to the determination of the effective
affinities from the analysis of Affymetrix data. Here and in
what follows we use the procedure originally introduced by
Naef and Magnasco [3] and extended more recently by
Binder and Preibisch [16].

Naef and Magnasco fit the brightness B of perfect matching
probes as a function of their sequence composition:

    \ln B = \sum_{i=1}^{25} \sum_{l} S_{il} A_{il} + \ln [\mathrm{RNA}] ,    (1)

where l = A, C, G, T is the letter index and i = 1, ..., 25
the position along the probe. S_il is a Boolean variable
equal to 1 if the probe sequence has letter l at position
i and 0 otherwise; thus the A_il are per-site, per-letter
affinities. The median of the PM brightnesses, [RNA],
is used in this expression as a surrogate for the RNA
concentration, which is not known for most Affymetrix
data sets.
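To make the bookkeeping of eq. (1) concrete, the following is a minimal Python sketch (not the authors' code) that evaluates its right-hand side for a single probe. The function names and the per-position dictionary layout for A_il are our own illustrative choices, and the affinity values in the usage lines are made up.

```python
import math

LETTERS = "ACGT"  # letter index l in eq. (1)


def indicator_matrix(probe):
    """S[i][j] = 1 if the probe has letter LETTERS[j] at position i, else 0."""
    return [[1 if base == letter else 0 for letter in LETTERS] for base in probe]


def nm_log_brightness(probe, affinities, rna_median):
    """Right-hand side of eq. (1): sum_i sum_l S_il A_il + ln[RNA].

    `affinities[i]` maps each letter to A_il at position i (0-based here).
    """
    if len(affinities) != len(probe):
        raise ValueError("need one set of affinities per probe position")
    S = indicator_matrix(probe)
    total = 0.0
    for i, row in enumerate(S):
        for j, letter in enumerate(LETTERS):
            total += row[j] * affinities[i][letter]
    return total + math.log(rna_median)


# Usage with made-up affinities for a 2-mer:
toy = [{"A": 0.1, "C": 0.2, "G": 0.0, "T": 0.0},
       {"A": -0.1, "C": 0.3, "G": 0.0, "T": 0.0}]
log_b = nm_log_brightness("AC", toy, rna_median=1.0)  # 0.1 + 0.3 + ln(1)
```

Because exactly one S_il is nonzero at each position, the double sum simply picks out one affinity per position; the brightness itself is exp(log_b).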
In Affymetrix experiments, the brightness B will saturate
once the majority of the probes are bound to targets.
Capturing such saturation requires the use of Langmuir
isotherms; the approach above (eq. (1)) neglects
saturation effects, and hence is only expected to work
in the so-called Henry regime [?], characterized by brightnesses
much lower than the maximal value. Since only
a few probes reach saturation, neglecting saturation is
justifiable.

The experimentally measured fluorescence intensity I_s
of a probe with sequence s does not approach zero at zero
concentration of the matching target: there is a background
signal, probably due to non-specific binding. To
take this into account, we distinguish two contributions
to the fluorescence intensity: a constant background
intensity I_0 and the brightness B due to specific binding:

    I_s = I_0 + B ,    (2)

in which B is the brightness as in eq. (1). We tried different
background subtraction schemes in order to test
the robustness of the results. Fig. 1 shows the position-dependent
affinities A_il obtained from fitting the experimental
data to eqs. (1) and (2) for constant background
intensities of I_0 = 0, 50 and 100. In
the fit, the distance between the data and the model was
minimized on a logarithmic scale. We note that although
the shape of the fitted position-dependent affinities
remains the same in the three cases, the amplitudes vary
by a factor of 4. In all cases the shape is consistent with
what was found in Refs. [3, 16]: the position-dependent
affinities are approximately symmetric with respect to
the central position of the probe (i = 13), and the highest
affinity is for nucleotide C and the lowest for A in
the probe sequence. The affinities for the G and T bases
are almost degenerate and show less position dependence
than the affinities for the C and A bases.
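The background enters the fit only through the subtraction in eq. (2). A minimal sketch of how the log-scale fitting targets could be built for a chosen I_0 is given below; the function name is hypothetical, and dropping probes with I_s ≤ I_0 is our own assumption, since the text does not say how such probes were treated.

```python
import math


def log_fit_targets(intensities, rna_median, background):
    """Targets y_s = ln(I_s - I_0) - ln[RNA] for a log-scale fit of eqs. (1)-(2).

    Probes whose intensity does not exceed the assumed background I_0 have no
    well-defined logarithm and are skipped; their indices are returned separately.
    """
    targets, skipped = {}, []
    for s, intensity in enumerate(intensities):
        brightness = intensity - background  # B = I_s - I_0, eq. (2)
        if brightness <= 0:
            skipped.append(s)
            continue
        targets[s] = math.log(brightness) - math.log(rna_median)
    return targets, skipped
```

Repeating the fit with background = 0, 50 and 100 mirrors the sensitivity analysis of Fig. 1. Since d ln(I − I_0) = dI/(I − I_0) exceeds dI/I, a larger I_0 stretches the spread of the log-brightnesses, which is consistent with the growing amplitudes reported below.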

In the case of I_0 = 0 we obtain a rather weak signal. This
is somewhat expected, as in that case the non-specific part
of the signal may dominate, which induces a loss of specificity.
When higher values of I_0 are taken, a non-trivial
signal starts to emerge. As I_0 increases, the amplitude
of the strongest effective affinity increases to 0.2 and 0.4
for I_0 = 50 and 100, respectively.

In Fig. 2 we plot the fitted affinities A_il for probe sets
with an average intensity above 500. This case corresponds
to signals well above the background level, and
thus the results should depend only weakly on the chosen
value of I_0, as is indeed the case.

FIG. 2: As in Fig. 1, but disregarding all data for probe
sets with an average intensity below I = 500. The effective
affinities are less sensitive to the choice of I_0 than
in the fits of Fig. 1.

FIG. 3: Fit of spike-in data of the HGU95A microarray using
eq. (1). Here, we subtract from the intensity the known
background intensity at zero concentration.

As mentioned above, using the median of the PM
brightnesses [RNA] as an estimate of the RNA concentration
is the only option in the absence of knowledge of
its true value. Affymetrix, however, performed
a set of experiments in which some target sequences
are added in solution (spiked in) at a known
concentration. The results, known as the Latin square
data set, are publicly available from the Affymetrix web
site [19]. We used these data to refit the effective affinities
from eq. (1), using the true target concentration c_s
of sequence s rather than the median of the intensities.
Due to the large number of parameters, this procedure
typically yields values of A_il that are too noisy. To limit
the number of fitting parameters we therefore fitted
A_il only at the fixed positions i = 1, 4, 7, 10, ..., 25
and, for the other values of i, used a linear interpolation
between the two neighboring fitted values. Note that the Latin
square set also contains a series of reference intensities
measured in the absence of the transcripts in solution (i.e.
c_s = 0), which yields a direct estimate of
the background signal I_0. The position-dependent affinities
obtained from fitting the Latin square set are
shown in Fig. 3. The results, although still somewhat
noisy, follow the general trend already shown in Figs. 1
and 2.
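The reduced parametrization described above can be sketched as follows: only nine values per letter (at i = 1, 4, ..., 25) are free, and the remaining positions follow by linear interpolation. This is an illustrative reconstruction with a hypothetical function name; the fitting step itself is not shown.

```python
KNOTS = list(range(1, 26, 3))  # fitted positions i = 1, 4, 7, ..., 25 (nine knots)


def interpolate_affinities(knot_values):
    """Expand fitted values at KNOTS to all positions 1..25 by linear interpolation.

    `knot_values[k]` is the fitted A_il (for one letter l) at position KNOTS[k].
    Returns a list whose entry i - 1 is A_il at position i.
    """
    if len(knot_values) != len(KNOTS):
        raise ValueError(f"expected {len(KNOTS)} knot values")
    profile = []
    for i in range(1, 26):
        k = min((i - 1) // 3, len(KNOTS) - 2)  # index of the knot at or left of i
        left, right = KNOTS[k], KNOTS[k + 1]
        t = (i - left) / (right - left)
        profile.append((1 - t) * knot_values[k] + t * knot_values[k + 1])
    return profile
```

This reduces the number of free parameters per letter from 25 to 9, at the price of forcing the profile to be piecewise linear between knots.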

The fact that the position-dependent affinities are
lower for G than for C and for A than for T is consistent
with the hybridization data in solution, as pointed out
in Ref. [?]. This apparent "asymmetry" is due to the
asymmetry between the DNA strands of the surface-bound
probes and the RNA strands of the target molecules in
solution.

The fact that the effective affinities for G and T are 
close is quite surprising, given the clear differences in 
binding free energies in solution; we will argue below 
that this is due to hybridization between RNA target 
molecules in solution. 



III. EFFECTIVE AFFINITIES RESULTING 
FROM MOLECULAR BASED MODELS 



To obtain more insight into the relation between the
hybridization free energies of Table I and the effective
affinities of Refs. [3, 16], which we analyzed in the
previous section, we extract effective affinities from a
model recently proposed by two of us [17].

This model is based on ideas from Held et al. [?]. As
it uses as input the binding free energies between DNA
and RNA strands in solution reported in Table I, we will
refer to it as the molecular-based model. Additionally,
it incorporates, in an approximate way, the effect of RNA-RNA
binding in solution, fitted to the intensities measured
on an Affymetrix chip. The intensity I_s of sequence
s is assumed to be proportional to the fraction of
hybridized probes at the surface, described by a Langmuir
model. In detail, it is given by [17]

    I_s = I_{\max} \frac{\alpha_s c_s Z_s}{1 + \alpha_s c_s Z_s} ,    (3)

where I_max is the saturation intensity, c_s is the total
concentration of targets with sequence s in solution, Z_s is
the partition sum over states in which target s is bound to
the probe, and α_s is the fraction of targets in solution
which are free, i.e. not hybridized in solution.
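Eq. (3) is easy to evaluate once α_s, c_s and Z_s are known. The minimal Python sketch below uses our own naming, with I_max defaulting to 1 as an arbitrary normalization; it also makes explicit that for α_s c_s Z_s ≪ 1 the isotherm reduces to the linear Henry regime assumed in eq. (1).

```python
def langmuir_intensity(c, Z, alpha=1.0, i_max=1.0):
    """Eq. (3): I = I_max * x / (1 + x), with x = alpha * c * Z.

    For x << 1 this is approximately I_max * x (Henry regime, linear in c);
    for x >> 1 the intensity saturates at I_max.
    """
    x = alpha * c * Z
    return i_max * x / (1.0 + x)
```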
FIG. 4: Schematic picture of a partially hybridized configuration.
The total probe length is k base pairs and we allow for
k < 25, as the photolithographic process used by Affymetrix
produces probes which are polydisperse. The target here is
bound partially, from base m to base r. We include in the
calculation the entropy loss ΔS(m) due to the proximity of
the target tail and the surface.



In the model of Ref. [17],

    Z_s = \exp(-\beta \Delta G_s) ,    (4)

where β = 1/(RT) is the inverse temperature, and ΔG_s
is the total binding free energy for a perfectly formed
helix of 25 base pairs between the RNA target and DNA
probe. This binding free energy is given by

    \Delta G_s = \sum_{i=1}^{24} \sum_{l,l'} S_{il} S_{i+1,l'} \Delta G(l,l') + \Delta G_{\mathrm{init}} .    (5)



As before, S_il is a Boolean variable equal to 1 if the
probe sequence has letter l at position i and 0 otherwise.
Thus, the sum in eq. (5) runs over the 24 pairs of neighboring
positions of the probe, each contributing one stacking
parameter ΔG(l, l'), which depends on the identity of the
two neighboring nucleotides l and l' in the surface-bound
DNA strand. ΔG_init represents a helix initiation cost [?].
For the stacking parameters the model uses the RNA/DNA
free energies given in Table I, as obtained from experiments
in solution [?]. Note that, unlike in the
approach of Refs. [3] and [16], the free energies used
here are position-independent. In Ref. [17], the inverse
temperature β in eq. (4) is taken as a fitting parameter.
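The following minimal sketch evaluates eqs. (4) and (5) for a single probe. The stacking parameters are passed in as a dictionary so that the values of Table I, or any other parameter set, can be used. Keying the stacks by probe dinucleotide is our assumption about how the table is looked up, and the uniform value and temperature in the usage lines are purely hypothetical.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mole K)


def binding_free_energy(probe, stacking, dg_init=3.14):
    """Eq. (5): stacking terms over the len(probe) - 1 neighbor pairs plus initiation.

    `stacking` maps a dinucleotide of the probe sequence (e.g. "AC") to its
    stacking free energy Delta G(l, l') in kcal/mole; favorable stacks are
    negative, i.e. minus the -Delta G values listed in Table I.
    """
    pairs = (probe[i:i + 2] for i in range(len(probe) - 1))
    return sum(stacking[pair] for pair in pairs) + dg_init


def boltzmann_weight(dg, temperature):
    """Eq. (4): Z_s = exp(-beta * Delta G_s), with beta = 1 / (R T)."""
    return math.exp(-dg / (R * temperature))


# Hypothetical uniform stacking energy of -1 kcal/mole for every pair:
dg = binding_free_energy("A" * 25, {"AA": -1.0})  # 24 * (-1.0) + 3.14 = -20.86
z = boltzmann_weight(dg, temperature=500.0)
```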

We stress that in Ref. [17] the hybridization free energy
ΔG = ΔH − TΔS was taken at T = 37 °C, while
an Affymetrix hybridization experiment is performed at
T = 45 °C, which is the value we consider here (see Table I).
Although the temperature differs by only 8 °C,
the ΔG's differ on average by about 20%, since ΔH and
TΔS are rather close. We took the sequences of the Latin
square set (25 nucleotides in length) and computed the ΔG
of each sequence at both temperatures. A plot of ΔG_37
vs. ΔG_45 shows that the values are narrowly distributed
along a straight line. This implies that the difference
between the two choices of parameters can be reabsorbed
in a rescaling of β in eq. (4).
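Written out, the rescaling argument takes one line. Assume, as the narrow distribution along a straight line suggests, that ΔG_s^{37} ≈ k ΔG_s^{45} + b for all sequences, where the slope k and intercept b are symbols introduced here for illustration (their values are not given in the text). The Boltzmann factor of eq. (4) then becomes:

```latex
\Delta G_s^{T} = \Delta H_s - T\,\Delta S_s ,
\qquad
e^{-\beta\,\Delta G_s^{37}}
  \simeq e^{-\beta b}\; e^{-(k\beta)\,\Delta G_s^{45}} .
```

Fitting β with the 45 °C parameters therefore reproduces the 37 °C weights with β replaced by kβ, up to the sequence-independent prefactor e^{−βb}, which can be absorbed into the concentration scale in eq. (3).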

Of practical importance is the total concentration c_s of
targets with sequence s. Due to hybridization of single-stranded
RNA in solution, the concentration of free
targets, which can bind to the probes, is lower than the
total concentration of targets in solution. In the model
of Ref. [17], this is taken into account by reducing the
total concentration c_s in solution by a factor α_s given
by

    \alpha_s = \frac{1}{1 + c_0 \exp(\beta' \Delta G_R)} ,    (6)

where β' and c_0 are fitting parameters and ΔG_R is the
(sequence-dependent) RNA/RNA binding free energy for
duplex formation in solution, taken from Ref. [4]. Note
that α_s is also highly sequence-dependent: CG-rich targets
will have a high affinity for the complementary surface-bound
probes, but will also have a strong tendency to
hybridize in solution. It has been shown that a unique
(i.e. probe-independent) choice of the parameters I_max,
β, β' and c_0 fits the experimental data well [17].
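A minimal sketch of eq. (6), written as a numerically stable logistic function of the exponent ln c_0 + β'ΔG_R. Passing ln c_0 instead of c_0 is our choice (convenient because c_0 is quoted in exponential form), and the function simply evaluates the formula as written: the sign convention of ΔG_R must match the one used when β' and c_0 were fitted.

```python
import math


def free_fraction(dg_rna, beta_prime, log_c0):
    """Eq. (6): alpha_s = 1 / (1 + c0 * exp(beta' * Delta G_R)), with c0 = exp(log_c0).

    Both branches compute the same logistic function; the split avoids
    overflow in exp() when the exponent is large and positive.
    """
    exponent = log_c0 + beta_prime * dg_rna
    if exponent > 0:
        e = math.exp(-exponent)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(exponent))
```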

There are many similarities, and also some discrepancies,
between the intensities I_s in the Naef and Magnasco
(NM) approach, eq. (1), and in the molecular-based model,
eq. (3). The binding free energy in the NM approach
is captured in the summation on the right-hand side of
eq. (1), which is very similar to the summation in eq. (5)
of the molecular-based model. NM uses a summation
over single bases with position-dependent affinities,
while the molecular-based model uses (in eq. (5)) a
summation over pairs of neighboring bases (stacking
energies) with position-independent strengths. As we
already mentioned, NM does not feature saturation, while
the molecular-based model does, through the denominator
in eq. (3). Finally, the clear position dependence of
the effective affinities obtained with the NM approach is
not included in the molecular-based model of Ref. [17].

FIG. 5: Effective affinities, obtained with the molecular-based
model, versus position in the probe, for the four different
nucleotides. In panel (a) only the binding energy is taken into
account; the effective temperature T is 800 K. In panel (b),
hybridization in solution is also taken into account, as in
the molecular-based model of Ref. [17]; the resulting effective
temperature becomes T = 570 K. The effect of the
"zipper" and of the probe length distribution is shown in
panel (c), resulting in an effective temperature of T = 525 K.
In panel (d) all effects mentioned in the text are taken into
account and the effective temperature is reduced to T = 494 K.

A. Extending the molecular-based model 

In this work, we introduce several extensions to the
latter model. These extensions give rise to position
dependence in the effective affinities, without ad hoc
modifications of the stacking free energy parameters.
Most of these extensions are related to the fact that both
target and probe are polydisperse in length, and that the
duplex can fluctuate and partially unzip. We first explain
these extensions and then discuss their effect.

• Unzipping. Besides the configuration in which the
target is bound to the probe over its full length,
other configurations occur in which the target covers
only part of the probe. This is taken into account
by a "zipper" model. As a result, the partition
sum Z_s no longer contains only the single term
exp(−βΔG_s), but is a sum over many terms
exp(−βΔG_s(m, r)), where ΔG_s(m, r) is given by eq. (5)
with the stacking sum restricted to the bound stretch
from the first bound pair m to the last bound pair
r > m. This idea is visualized in Figure 4.

• Probe length dispersity. During the production process
of the Affymetrix chips, the probability p_g
that a probe grows by an extra nucleotide is only
around p_g ≈ 90% [21]. This means that the fraction
of probes which reach the final full length of
25 nucleotides is P(25) = p_g^25, only about 7%. The fraction
of incomplete probes reaching a length l < 25 equals
P(l) = p_g^l (1 − p_g). We have included the effect
of probe length dispersity by including these probabilities
in the calculation. The intensity is therefore
I = Σ_{l=1}^{25} P(l) I_l, where I_l is the Langmuir
isotherm corresponding to a probe of length l.

• Non-specific binding. Even in Affymetrix experiments
where no perfectly matching targets are
present, the intensity does not fall much below 0.5%
of the maximal intensity. We attribute this to non-specific
binding to the probes. To account for the
non-specific binding, we include in our model a
constant, sequence-independent background intensity
I_0.

• Tail repulsion. The RNA target molecules often
extend beyond the 25 nucleotides of the probe; the
average target length is 50 nucleotides. The tail
of the target which sticks out at the base of
the probe is significantly hindered by the surface
(see Figure 4). This causes an entropic repulsion
between the target and the surface, lowering the
intensity. The mathematics of this effect is presented
in Appendix A. This effect is not sequence-dependent,
and the Boltzmann weights entering Z_s in eq. (4) can
therefore be multiplied by a sequence-independent factor
e^{ΔS(m)/R}, given in eq. (A4).

• Fluorescent labels. Since in the experiments only
the U and C nucleotides can carry a label, the fluorescence
intensity will scale linearly with the number of U
and C nucleotides, which obviously depends on the sequence.
We therefore multiplied each Langmuir isotherm by
a factor 2x_UC, in which x_UC is the fraction of U and C in
the target sequence (so that the factor is 1 on average for a
random sequence). We assumed that the target is
simply composed of 25-mers.
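The unzipping, probe-length and labeling extensions can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the function names are hypothetical, stacking free energies are supplied per neighbor pair, the tail-entropy factor of Appendix A is left out of the zipper sum, and length-0 probes are kept in the length distribution only so that the probabilities sum to one (they contribute no signal).

```python
import math

R = 1.987e-3  # gas constant in kcal/(mole K)


def zipper_partition_sum(stack_dgs, dg_init, temperature):
    """Zipper model: sum over all bound stretches m..r (r > m) of exp(-beta * dG(m, r)).

    `stack_dgs[k]` is the stacking free energy between bases k and k + 1
    (0-based), so a stretch from base m to base r contributes the stacks
    m..r-1 plus the initiation cost.
    """
    beta = 1.0 / (R * temperature)
    n = len(stack_dgs) + 1  # number of bases
    z = 0.0
    for m in range(n):
        dg = dg_init
        for r in range(m + 1, n):
            dg += stack_dgs[r - 1]
            z += math.exp(-beta * dg)
    return z


def probe_length_distribution(p_grow=0.9, full_length=25):
    """P(l) = p^l (1 - p) for l < full_length, and P(full_length) = p^full_length."""
    dist = {l: p_grow ** l * (1 - p_grow) for l in range(full_length)}
    dist[full_length] = p_grow ** full_length
    return dist


def dispersed_intensity(intensity_of_length, dist):
    """I = sum_l P(l) I_l, where intensity_of_length(l) returns I_l (0 for l = 0)."""
    return sum(p * intensity_of_length(l) for l, p in dist.items())


def label_factor(target):
    """Factor 2 * x_UC, with x_UC the fraction of U and C in the target sequence."""
    return 2.0 * sum(base in "UC" for base in target) / len(target)
```

With p_g = 0.9, the full-length weight dist[25] = 0.9^25 is about 0.072, so most of the signal in this sketch comes from truncated probes.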



B. Results of the model calculations 

We generated 100,000 different random sequences of
25 nucleotides each. For each sequence s = 1, ..., 10^5, we
also selected a concentration c_s between a minimal value
c_min = 1 picomolar and a maximal value c_max = 1 nanomolar
(the typical range of target concentrations in Affymetrix
arrays); the logarithm of these concentrations, log(c_s), is
drawn from a uniform distribution on [log(c_min), log(c_max)].
For each sequence s, the intensity I_s is calculated using
the molecular-based model, eq. (3), with the extensions
just described. The parameters entering this equation
are the stacking free energies given in Table I, as
well as the parameter α_s reflecting the reduction of the
total concentration of free targets in solution; this latter
(sequence-dependent) parameter uses the RNA/RNA
binding free energies for duplex formation in solution,
taken from Ref. [4]. The modifications of the molecular-based
model with respect to the model of Ref. [17], as
well as the different choice of free energy parameters
(ΔG_45 vs ΔG_37), require a refitting of the effective inverse
temperature β' and of the concentration c_0, which yielded
β' = 0.6 (kcal/mole)^{-1} and c_0 = e^{εβ'}, with ε = 42
kcal/mole. The fitting procedure for these two parameters
follows that of Ref. [17].
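The sampling step can be sketched as follows. The function name and the fixed seed are illustrative choices, and concentrations are in mol/L. Drawing log c uniformly, rather than c itself, gives each decade between 1 pM and 1 nM equal weight.

```python
import math
import random


def sample_ensemble(n, length=25, c_min=1e-12, c_max=1e-9, seed=0):
    """Random sequences paired with log-uniformly distributed concentrations (mol/L)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    ensemble = []
    for _ in range(n):
        seq = "".join(rng.choice("ACGT") for _ in range(length))
        log_c = rng.uniform(math.log(c_min), math.log(c_max))
        ensemble.append((seq, math.exp(log_c)))
    return ensemble
```

For the setting described in the text, one would call sample_ensemble(100_000).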

In the experimental Affymetrix data set, the average
intensity is around 3% of the maximal intensity. In all
our simulations, we adjusted the temperature to reproduce
this average intensity. The resulting temperatures
range from 494 K to 550 K. There is still a gap with respect
to the experimental temperature of 318 K, but including the
effects mentioned above has significantly reduced this
gap compared with the original molecular-based model, where
the effective temperature was 700 K [17]; in turn, the latter
model already had a much more realistic effective temperature
than the Held model, where the effective temperature
exceeded 2000 K. To obtain the effective
affinities A_il associated with the molecular-based model, we
minimize the difference between the intensity I_s predicted
by the molecular model in eq. (3) and the intensity
I_e resulting from the effective affinities, given by

    \ln I_e = \sum_{i,l} S_{il} A_{il} + \ln c_s ,    (7)

in analogy to eq. (1). More precisely, the effective affinities
A_il result from minimizing the sum, over all
100,000 sequences, of the squared difference between
the logarithm of the intensity I_s and the logarithm of the
intensity I_e resulting from the effective affinities.
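A minimal sketch of this least-squares step in plain Python, using batch gradient descent on the log-residuals of eq. (7). The choice of optimizer is ours (the text does not specify one), and the function names are hypothetical. Because each position carries exactly one letter, the A_il are determined only up to per-position shifts whose sum vanishes, so the fit should be judged by its predictions rather than by individual A_il values.

```python
import random

LETTERS = "ACGT"


def predict(A, seq):
    """sum_{i,l} S_il A_il for one sequence, i.e. ln I_e - ln c_s in eq. (7)."""
    return sum(A[i][LETTERS.index(base)] for i, base in enumerate(seq))


def fit_effective_affinities(seqs, log_targets, steps=500, lr=None):
    """Least-squares fit of A_il in eq. (7) by batch gradient descent.

    `log_targets[s]` holds ln I_s - ln c_s for sequence s. The mean (instead
    of the sum) of squared residuals is minimized; both have the same minimizer.
    """
    length = len(seqs[0])
    if lr is None:
        # The largest Hessian eigenvalue is roughly length / 2 for random
        # sequences, so this keeps lr * eigenvalue safely below 2.
        lr = 1.5 / length
    n = len(seqs)
    rows = [[LETTERS.index(base) for base in seq] for seq in seqs]
    A = [[0.0] * len(LETTERS) for _ in range(length)]
    for _ in range(steps):
        grad = [[0.0] * len(LETTERS) for _ in range(length)]
        for row, y in zip(rows, log_targets):
            resid = sum(A[i][j] for i, j in enumerate(row)) - y
            for i, j in enumerate(row):
                grad[i][j] += 2.0 * resid / n
        for i in range(length):
            for j in range(len(LETTERS)):
                A[i][j] -= lr * grad[i][j]
    return A


# Synthetic check: targets generated from known affinities are reproduced.
rng = random.Random(1)
true_A = [[rng.uniform(-0.3, 0.3) for _ in LETTERS] for _ in range(5)]
seqs = ["".join(rng.choice(LETTERS) for _ in range(5)) for _ in range(200)]
y = [predict(true_A, s) for s in seqs]
A = fit_effective_affinities(seqs, y)
max_err = max(abs(predict(A, s) - t) for s, t in zip(seqs, y))
```

In the setting of the text, seqs would be the 100,000 random 25-mers and the targets ln I_s − ln c_s from the molecular-based model. At that size a direct linear least-squares solver would be the practical choice; gradient descent is used here only to keep the sketch dependency-free.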

The first data set corresponds to a simple two-state model,
in which a target is either free in solution or fully bound
to a probe. Hybridization in solution is not taken into
account, i.e., α_s = 1. The results are shown in Figure 5(a).
The effective affinities do not depend on the
position, apart from the two edge nucleotides, which enter
only one pair of neighboring bases (see eq. (5)).
Note that the affinities increase in the order
A < T < G < C, as expected from the values of
the free energies in Table I.

Next, hybridization in solution is taken into account
by using two extra parameters β' and c_0, which have
the values β' = 0.6 (kcal/mole)^{-1} and c_0 = e^{εβ'} with
ε = 42 kcal/mole, respectively. Still, the effective affinities
are not position dependent; see Figure 5(b). However,
the order of the curves has changed: A < G < T < C.

Figure 5(c) shows the effective affinities when the polydispersity
of the probe length distribution and the possibility that a
duplex can zip open have been taken into account. These
two effects lead to position-dependent effective affinities.
The effect on the side of the microarray surface is larger
than that on the solution side. Furthermore, the effective
affinities of G and T have become more alike.

The last panel of Figure 5 shows the effective affinities
when the effects of noise (non-specific binding), of the
tail entropy, and of the fact that only U and C carry
fluorescent labels are also taken into account. The biggest
effect is that the effective temperature is lowered.
Furthermore, the ordering has become A < G ≈ T < C, in
agreement with the order of effective affinities observed in
experiment (see Figs. 1-3). Note also that the amplitudes
of the effective affinities range from about −0.2 to 0.2
(see Fig. 5(c)-(d)). This is fully consistent with the values
obtained in Section II.



IV. CONCLUSIONS 

In this paper we have analyzed the relation between the
effective affinities as originally introduced by Naef and
Magnasco [3] and those obtained from a molecular-based
model [17], which uses hybridization free energies in solution.
We have shown that these two models yield very similar
effective affinities. This implies that free energies in solution
are adequate parameters to describe hybridization
in Affymetrix microarrays, at least if an effective temperature
is used.

Firstly, the fact that the effective affinity for G is lower
than that for C and that the affinity for A is lower than
that for T is consistent with hybridization data in solution,
as pointed out in Refs. [?]. Here, we have
shown the role of target-target hybridization in solution,
which in the molecular-based approach is described
by the parameter α (see eq. (6)). The effect of α is to
diminish the differences in the effective affinities between
different nucleotides, so as to make the effective affinities
for G and T almost "degenerate" (see Fig. 5). This is
consistent with the data of Naef and Magnasco [3], Binder
and Preibisch [16] and our results of Sec. II. The basic
physics behind this effect is quite clear. The small difference
between the effective affinities for G and T, in spite
of the large difference in binding free energies in solution
between these two nucleotides, is caused by the fact
that G-rich sequences tend to hybridize strongly in solution,
thereby diminishing their concentration available
for binding to the probes.

We note that the calculations of the previous section
yield effective affinities which are position-dependent,
mostly because the probe-target complex can partly
open up at the ends. To a lesser extent, the
target-surface repulsion and the polydispersity of the
probes also play a role. The profiles of the effective affinities
calculated in Sec. II are somewhat smoother than those
deduced from the molecular-based model. This difference
is, however, small. The most important aspect of
our analysis is that the molecular-based model
(1) reproduces the degeneracy between the affinities of G
and T and (2) yields amplitudes for the affinities
quantitatively close to those calculated in Sec. II.

We finally comment on other possible ways of linking
effective affinities to hybridization free energies obtained
from melting experiments in solution. A recent
study [?] attributed the differences between the
two quantities to the effect of biotin molecules on
binding. This is an alternative point of view compared to
our approach, which instead emphasizes the effect of
hybridization in solution between partially complementary
single-stranded RNA molecules. In this respect it would
be interesting if melting-temperature measurements
of biotinylated RNA/DNA duplexes in solution, similar to
those of Ref. [5], could be performed.
These experiments would allow one to quantify the effect of
biotin on binding. To our knowledge such experiments
have not yet been performed.
APPENDIX A: ENTROPIC REPULSION
BETWEEN SUBSTRATE AND TARGET TAIL

FIG. 6: The fraction of paths originating in r = (0, 0, z) and
never crossing the plane z = 0 can be found with the method
of images: the number of paths crossing the plane and ending
in r' is equal to the total number of paths starting from −r
and ending in r'.

We model the single-stranded DNA segment as a freely
jointed chain with Kuhn length b. The probability distribution
that a segment of N Kuhn steps extends to a
distance r from its origin is given by a Gaussian distribution:

    T(\vec{r}, N) = \left( \frac{3}{2\pi N b^2} \right)^{3/2} \exp\left( -\frac{3 r^2}{2 N b^2} \right) .    (A1)

To determine the number of polymers starting from a
height z above the surface and not crossing the wall,
we use the method of mirror images. Using the configuration
of Fig. 6, the fraction of walks of length N originating
from r = (0, 0, z) and terminating at r' is equal to
T(r' − r, N). A part of these cross the wall; this fraction
is equal to T(r' + r, N), i.e. the number of walks originating
in −r and terminating in r'. Therefore the fraction of walks
of total length N starting in r and terminating above the
wall without crossing it is given by the difference:

    e^{\Delta S_N(m)/R} = \int_{z'>0} d\vec{r}' \left[ T(\vec{r}' - \vec{r}, N) - T(\vec{r}' + \vec{r}, N) \right] = \mathrm{Erf}\left( \sqrt{\frac{3 z^2}{2 N b^2}} \right) ,    (A2)

where Erf(x) denotes the error function, defined as

    \mathrm{Erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt .    (A3)

We recall that the Kuhn length is related to the persistence
length as b = 2 l_p and that for single-stranded DNA
l_p ≈ 5 bp, so that b ≈ 10 bp.

We next sum over all possible tail lengths. Before hybridization
the target molecules are fragmented at random
locations, with an average fragment length of about
50 bp. With the tail starting at a height z of m + m_0
base-pair lengths, we thus find:

    e^{\Delta S(m)/R} = (1 - \gamma) \sum_{N=0}^{\infty} \gamma^{N}\, \mathrm{Erf}\left( \frac{m + m_0}{10} \sqrt{\frac{3}{2N}} \right) ,    (A4)

in which γ = 49/50 is the probability of chain continuation,
and m_0 is the ratio of the spacer distance to the
length of a single base pair.
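Eq. (A4) can be evaluated numerically by truncating the geometric sum; a minimal sketch is given below. The truncation point n_max and the function name are our own choices, the factor 10 in (A4) is exposed as the parameter kuhn_bp (b ≈ 10 bp), and the spacer ratio m_0 must be supplied by the user since no value is quoted here.

```python
import math


def tail_entropy_factor(m, m0, gamma=49 / 50, kuhn_bp=10.0, n_max=5000):
    """e^{Delta S(m)/R} from eq. (A4), truncating the sum over N at n_max.

    The N = 0 term has Erf(infinity) = 1 (a tail of zero length never crosses
    the surface). Since Erf <= 1, the truncation error is at most gamma**(n_max + 1).
    """
    weight = 1.0 - gamma  # (1 - gamma) * gamma**0
    total = weight        # N = 0 term
    height = (m + m0) / kuhn_bp
    for n in range(1, n_max + 1):
        weight *= gamma
        total += weight * math.erf(height * math.sqrt(3.0 / (2.0 * n)))
    return total
```

The factor lies between 0 and 1 and grows with m + m_0: configurations whose tail starts farther from the surface lose less entropy.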